Skip to content

Fix view_inference spec deletion bug + add abstraction level examples#57

Merged
ChuyueSun merged 13 commits into
mainfrom
new-workflow
Nov 7, 2025
Merged

Fix view_inference spec deletion bug + add abstraction level examples#57
ChuyueSun merged 13 commits into
mainfrom
new-workflow

Conversation

@ChuyueSun

@ChuyueSun ChuyueSun commented Nov 6, 2025

Copy link
Copy Markdown
Owner
  • Implement surgical insertion in view_inference (prevents spec keyword deletion)
  • Add pattern detection for all 5 View types
  • Create 4 educational examples for abstraction level teaching
  • Update spec_inference with smart example selection
  • Validated on 13 benchmarks: 84% success rate, 100% spec preservation

Primary bug FIXED: bitmap_2_todo achieves 100% verification (V=8/8)


Note

Overhauls view/spec inference with surgical insertion and abstraction-aware examples, adds repair-round timeouts, experiments/analysis tooling, CI/pre-commit, and comprehensive docs with VeriStruct rebrand.

  • Core (verification):
    • View Inference: Implement surgical insertion; detect existing spec fn view/impl View TODOs; robust brace matching; insert body/impl safely with delimiter/safety checks.
    • Spec Inference: Add low-level pattern detection (bit-vectors/packed structs); prioritize concrete postcondition examples; inject targeted guidance; syntax fixups for spec clauses.
    • Proof/Repairs: Regex-based syntax fixes; strengthened invariant/proof guidance; multiple modules streamlined; improved safety checks.
  • Repair orchestration:
    • Add repair round timeout plumbing (config, main, RepairRegistry) with round start/elapsed checks; tests and verifier script.
  • CLI/Runner updates:
    • Enhance run_agent.py, run_bench*.py, run_all_benchmarks.py (parallelism, no-cache, UX/messages); add analyze_results.py (root) and analysis tooling.
  • Experiments:
    • Add experiments/ framework (runner, analyzer, quick script, sample corpus, results reporting & visualizations).
  • CI/Tooling:
    • Add pre-commit config, flake8/markdownlint, GitHub Actions workflow, shellcheck config, ignore patterns.
  • Docs & Rebrand:
    • Rebrand to VeriStruct; major README and documentation additions (tutorials, modules, planner/workflow, examples); config/setup guides.
  • Misc:
    • Add lemma macros (bit.rs); numerous small refactors, logging, and robustness improvements.

Written by Cursor Bugbot for commit 6090459. This will update automatically on new commits. Configure here.

- Implement surgical insertion in view_inference (prevents spec keyword deletion)
- Add pattern detection for all 5 View types
- Create 4 educational examples for abstraction level teaching
- Update spec_inference with smart example selection
- Validated on 13 benchmarks: 84% success rate, 100% spec preservation

Primary bug FIXED: bitmap_2_todo achieves 100% verification (V=8/8)
- Remove reflection and analysis markdown files
- Add .gitignore patterns to prevent future tracking
- Keep files locally for reference but not in repository

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Broken checkpoint best-tracking overwrites better scores

Incorrect checkpoint best tracking logic that prevents proper score comparison. After retrieving checkpoint_best_score and checkpoint_best_code from context (lines 732-735), the code initializes them if they're None (lines 738-745). However, this initialization logic never compares the new best_score against the existing checkpoint_best_score to determine which is actually better. The code unconditionally overwrites both values with the current best_code and best_score on lines 747-748, even if the checkpoint best was superior. This means the global best tracking will always use the most recent result rather than the best result across all attempts, defeating the purpose of checkpoint best tracking.

The correct logic should compare scores and only update if the new score is better (or if checkpoint_best_score is None). This bug causes the system to lose track of the best code encountered during the workflow.

src/modules/view_inference.py#L731-L749

# Use cache only for first attempt
responses = self._get_llm_responses(
instruction,
code,
examples,
retry_attempt=retry_attempt,
use_cache=True,
context=context, # Pass context for tracking
# use_cache=(retry_attempt == 0)
)
if not responses and retry_attempt == max_retries - 1:
return code
safe_responses.extend(self._process_responses(responses, original_code))
if safe_responses:
self.logger.info(
f"Found {len(safe_responses)} safe responses after {retry_attempt + 1} attempts"
)

Fix in Cursor Fix in Web


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/modules/view_inference.py
- Remove TIMEOUT_IMPLEMENTATION_SUMMARY.txt
- Remove benchmark_summary_*.txt
- Remove check_benchmark_status.sh
- Update .gitignore to prevent future tracking
Comment thread src/modules/view_inference.py Outdated
Comment thread src/modules/view_inference.py
Comment thread src/modules/view_inference.py Outdated
- Add _find_matching_brace helper that properly handles strings, comments, and nested braces
- Update has_spec_fn_view to use improved brace matching for accurate function body extraction
- Update has_view_trait_with_todo to correctly locate View trait implementations
- Improve _extract_view_impl and _extract_view_implementation for more reliable parsing
- Fix pre-commit workflow formatting

These changes significantly improve the robustness of parsing Verus code when detecting and extracting view implementations, especially in complex code with nested braces or comments.
Comment thread src/modules/view_inference.py
…ction

- Fixed insert_view_body to detect and strip minimum indentation before adding 8 spaces
  This prevents double-indentation when LLM-generated view bodies already contain proper indentation
- Improved detect_spec_fn_view_todo to search all impl blocks instead of requiring struct and impl adjacency
  Makes detection more robust for various code layouts
fn main() {}


+

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Dangling Plus Line Causes Rust Compile Error

The file ends with an extra "+\n" line after the closing brace and main function. This appears to be a diff artifact that was accidentally committed as part of the file content. This will cause a compilation error as it's not valid Rust syntax.

Fix in Cursor Fix in Web

Major Changes:
- Rename project from VerusAgent to VeriStruct throughout
  - Updated README title and all references
  - Updated script banners and help text
  - Consistent with paper title (VeriStruct)

Script Consistency Improvements:
- run_agent.py: Enhanced help text with examples
- run_bench.py: Added comprehensive examples and clarifications
- run_bench_no_cache.py: Improved documentation
- run_all_benchmarks.py: Added argparse support and --configs argument

Documentation Enhancements:
- Added quick reference table for script selection
- Added design rationale section explaining argument patterns
- Fixed all usage examples with correct arguments
- Updated repository URL to ChuyueSun/VeriStruct
- Enhanced project structure with inline argument hints

Argument Consistency:
- run_agent.py: --test-file (path) + --config (singular)
- run_bench.py: --benchmark (name) + --configs (plural)
- run_bench_no_cache.py: --benchmark (name) + --configs (plural)
- run_all_benchmarks.py: --configs (plural)

All changes maintain backward compatibility and pass linting checks.
Comment thread src/prompts/verus_common.md Outdated
Updated all remaining references to use VeriStruct consistently:

Documentation:
- All tutorial files (README, 01-04)
- Technical documentation (workflow, planner, modules)
- Example documentation (bitmap, ringbuffer)
- Repair module documentation

Source Code:
- src/main.py
- src/modules/baserepair.py
- src/modules/statistics_collector.py
- src/modules/progress_logger.py
- src/modules/utils.py
- src/modules/repair_registry.py

Experiments:
- experiments/README.md
- experiments/experiment_runner.py
- experiments/analyze_results.py
- experiments/run_quick_experiment.sh
- experiments/sample_corpus.json

Root Files:
- README_BASELINE.md
- README_modules.md
- YOUR_CONFIG_SETUP.md
- run_baseline_bench.py
- run_repair_effectiveness_experiment.py
- run_all_benchmarks.py
- customize.fish

LaTeX Files:
- tex/pseudocode.tex
- tex/main_execution.tex

Configuration:
- src/configs/README.md

All changes verified with grep - zero instances of 'VerusAgent' remain.
Project now uses VeriStruct as the sole name throughout.
- Modified verus_common.md: Changed instruction from 'use vector.len()' to
  'use vector@.len()' for consistency with other view operations
- Updated spec_inference.py: Added explicit guidance to use v@.len() for
  vectors/collections in spec contexts
- Added len_syntax_analysis.md: Documentation showing both v.len() and
  v@.len() work correctly in Verus, but standardizing on v@.len() for
  consistency with other @ operations (v@[i], v@.field)

Both syntaxes verify successfully, but v@.len() provides:
- Consistency with other view operations
- Explicit indication of spec-level view usage
- Clearer mental model for specifications
@ChuyueSun ChuyueSun merged commit d2f79a0 into main Nov 7, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant